
Image response from an MCP server#2239

Closed
stneng wants to merge 2 commits into openai:main from stneng:main


@stneng (Contributor) commented Dec 26, 2025

#1898 added support for image outputs from function tools, so the existing fallback logic for MCP tool results, shown below, is now outdated:

```python
# Fall back to regular text content processing
# The MCP tool result is a list of content items, whereas OpenAI tool
# outputs are a single string. We'll try to convert.
if len(result.content) == 1:
    tool_output = result.content[0].model_dump_json()
elif len(result.content) > 1:
    tool_results = [item.model_dump(mode="json") for item in result.content]
    tool_output = json.dumps(tool_results)
else:
    # Empty content is a valid result (e.g., "no results found")
    tool_output = "[]"
```

Resolves #2148 and https://community.openai.com/t/image-response-from-an-mcp-server-with-agents-sdk/1269441
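For illustration, here is a minimal sketch of the alternative the fallback above no longer needs to avoid. Instead of serializing every MCP content item to a JSON string, each item is mapped to a multimodal tool output. The sketch makes two assumptions. First, `TextContent` and `ImageContent` are stand-in dataclasses that mirror the MCP content fields (`text`; `data` and `mimeType`). Second, the output dict shapes (`{"type": "text", ...}`, `{"type": "image", "image_url": ...}`) are hypothetical. This is not the SDK's actual implementation or types.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Union


@dataclass
class TextContent:
    # Stand-in for an MCP text content item (assumed fields: type, text).
    text: str
    type: str = "text"


@dataclass
class ImageContent:
    # Stand-in for an MCP image content item (assumed fields: type, data, mimeType).
    data: str  # base64-encoded image bytes
    mimeType: str
    type: str = "image"


ContentItem = Union[TextContent, ImageContent]


def convert_item(item: ContentItem) -> dict:
    """Map one content item to a hypothetical tool-output dict."""
    if isinstance(item, TextContent):
        return {"type": "text", "text": item.text}
    if isinstance(item, ImageContent):
        # Inline the image as a data URL so it can be passed to the model directly.
        return {"type": "image", "image_url": f"data:{item.mimeType};base64,{item.data}"}
    raise TypeError(f"unsupported content item: {type(item).__name__}")


def convert_result(content: list[ContentItem]) -> dict | list[dict]:
    # A single item becomes a single output; zero or several become a list,
    # so empty content is still a valid (empty) result.
    if len(content) == 1:
        return convert_item(content[0])
    return [convert_item(item) for item in content]
```

Keeping text and image items as structured outputs lets the model receive the image itself rather than a JSON dump of its base64 payload.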

@seratch (Member) commented Dec 27, 2025

Thanks for sending this. Indeed, this could work well now. I will do thorough testing early next year.

@seratch seratch added the enhancement New feature or request label Dec 27, 2025
@seratch seratch added this to the 0.7.x milestone Jan 5, 2026
@seratch seratch marked this pull request as draft January 8, 2026 07:18
@seratch (Member) commented Jan 8, 2026

I ran a few local Codex reviews, and this feedback seems valid:

  • [P1] Tracing output double-encodes MCP tool results (src/agents/mcp/util.py:250-252)
    When recording MCP tool invocations, `current_span.span_data.output` is now set via `json.dumps(tool_output)`, even though `tool_output` is already the final return value (a string for structured
    content, or a dict/list for text/image content). This extra dump double-encodes structured content (producing strings like `"{\"foo\":1}"`) and turns dict/list outputs into JSON strings whose type and
    quoting differ from the returned value, so trace snapshots are incorrect for every MCP tool call. Span data should store the raw tool output instead of re-encoding it.
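Here is a minimal sketch of the double-encoding described in the review, using only Python's standard `json` module. `buggy_span_output` and `fixed_span_output` are hypothetical helpers that stand in for the assignment to `current_span.span_data.output`. They are not SDK functions.

```python
import json


def buggy_span_output(tool_output):
    # Hypothetical helper: mirrors the flagged json.dumps(tool_output) call.
    return json.dumps(tool_output)


def fixed_span_output(tool_output):
    # Hypothetical helper: records the final tool output unchanged.
    return tool_output


# Structured content: the tool output is already a JSON string.
structured = json.dumps({"foo": 1})
print(structured)                      # {"foo": 1}
print(buggy_span_output(structured))   # "{\"foo\": 1}"

# Text/image content: the tool output is a dict, which the extra dump turns into a str.
text_item = {"type": "text", "text": "hi"}
print(type(buggy_span_output(text_item)).__name__)  # str
print(type(fixed_span_output(text_item)).__name__)  # dict
```

Storing the raw value keeps the span's output identical in type and content to what the tool actually returned.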

@stneng (Contributor, Author) commented Jan 8, 2026

Thanks for the feedback; I just fixed it.

@stneng stneng marked this pull request as ready for review January 8, 2026 10:00
@seratch seratch modified the milestones: 0.7.x, 0.8.x Jan 17, 2026
@github-actions (Contributor)

This PR is stale because it has been open for 10 days with no activity.

@seratch (Member) commented Jan 28, 2026

I will merge your change in #2369.

@seratch seratch closed this Jan 28, 2026


Development

Successfully merging this pull request may close these issues.

Support returning multimodal content such as images from MCP tool calls.
